Multi-Dimensional Dependency Grammar as Multigraph Description
Authors
Abstract
Extensible Dependency Grammar (XDG) is a new, modular grammar formalism for natural language. An XDG analysis is a multi-dimensional dependency graph, where each dimension represents a different aspect of natural language, e.g. syntactic function, predicate-argument structure, or information structure. Thus, XDG brings together two recent trends in computational linguistics: the increased application of ideas from dependency grammar and the idea of multi-layered linguistic description. In this paper, we tackle one of the stumbling blocks of XDG so far: its incomplete formalization. We present the first complete formalization of XDG, as a description language for multigraphs based on simply typed lambda calculus.
Introduction
Extensible Dependency Grammar (XDG) (Debusmann et al. 2004) brings together two recent trends from computational linguistics:
1. dependency grammar
2. multi-layered linguistic description
Firstly, the ideas of dependency grammar, such as lexicalization, the head-dependent asymmetry, and valency, have become more and more popular in computational linguistics. Most of the popular grammar formalisms, like Combinatory Categorial Grammar (CCG) (Steedman 2000), Head-driven Phrase Structure Grammar (HPSG) (Pollard & Sag 1994), Lexical Functional Grammar (LFG) (Bresnan 2001) and Tree Adjoining Grammar (TAG) (Joshi 1987), have already adopted these ideas. Moreover, the most successful approaches to statistical parsing crucially depend on notions from dependency grammar (Collins 1999), and new treebanks based on dependency grammar are being developed for various languages, e.g. the Prague Dependency Treebank (PDT) for Czech and the TiGer Dependency Bank for German.
Secondly, many treebanks, such as the Penn Treebank, the TiGer Treebank and the PDT, are continuously being extended with layers of annotation beyond the syntactic layer, i.e. they are becoming more and more multi-layered.
For example, PropBank (Kingsbury & Palmer 2002) (Penn Treebank), the SALSA project (Erk et al. 2003) (TiGer Treebank) and the tectogrammatical layer (PDT) add a layer of predicate-argument structure. Other added layers concern information structure (PDT) and discourse structure, as in the Penn Discourse Treebank (Webber et al. 2005). These additional layers of annotation are often dependency-like, i.e. they could be straightforwardly represented in a multi-layered framework for dependency grammar. XDG is such a framework. It has already been successfully applied to model a relational syntax-semantics interface (Debusmann et al. 2004) and to model the relation between prosodic structure and information structure in English (Debusmann, Postolache, & Traat 2005). We hope to soon be able to employ XDG to make direct use of the information contained in the new multi-layered treebanks, e.g. for the automatic induction of multi-layered grammars for parsing and generation.
To achieve this goal, XDG still needs to overcome a number of weaknesses. The first is the lack of a polynomial parsing algorithm: so far, we only have a parser based on constraint programming (Debusmann, Duchier, & Niehren 2004), which is fairly efficient, given that the parsing problem is NP-hard, but does not scale up to large-scale grammars. The second major stumbling block of XDG so far is the lack of a complete formalization. The latter is what we will change in this paper: we will present a formalization of XDG as a description language for multigraphs based on simply typed lambda calculus (Church 1940; Andrews 2002). To give a hint of the expressivity of XDG, we additionally present a proof that the parsing problem of (unrestricted) XDG is NP-hard. We begin the paper by introducing the notion of multigraphs.
Copyright © 2006, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.
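As a rough illustration of the multi-dimensional dependency graphs described in this introduction, the sketch below stores one set of labeled edges per dimension over a single shared list of word nodes. It is not the formalization presented in the paper, which is based on simply typed lambda calculus. The class name `Multigraph`, its methods, the dimension names, the example sentence and the edge labels are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Multigraph:
    """Hypothetical container: one labeled edge set per dimension, shared nodes."""
    words: list[str]  # node i carries words[i]
    # dimension name -> set of (head index, dependent index, edge label)
    edges: dict[str, set[tuple[int, int, str]]] = field(default_factory=dict)

    def add_edge(self, dim: str, head: int, dep: int, label: str) -> None:
        """Add a labeled edge head -> dep on one dimension."""
        n = len(self.words)
        if not (0 <= head < n and 0 <= dep < n):
            raise IndexError("edge endpoints must be existing nodes")
        self.edges.setdefault(dim, set()).add((head, dep, label))

    def dependents(self, dim: str, head: int) -> list[tuple[str, str]]:
        """Return (label, word) pairs for the dependents of `head`, in word order."""
        found = [(d, l) for (h, d, l) in self.edges.get(dim, set()) if h == head]
        return [(l, self.words[d]) for d, l in sorted(found)]


# Illustrative sentence and labels (assumed, not taken from the paper's Figure 1).
g = Multigraph(words=["Peter", "wants", "to", "eat"])
g.add_edge("syntax", 1, 0, "subj")
g.add_edge("syntax", 1, 3, "vinf")
g.add_edge("syntax", 3, 2, "part")
g.add_edge("semantics", 1, 0, "agent")
g.add_edge("semantics", 3, 0, "agent")
```

Because every dimension indexes into the same `words` list, the same word can have different heads on different dimensions: here "Peter" depends only on "wants" in the assumed syntactic dimension, but on both "wants" and "eat" in the assumed semantic one.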
Multigraphs
Multigraphs are motivated by dependency grammar, and in particular by its structures: dependency graphs.
Dependency Graphs
Dependency graphs such as the one in Figure 1 typically represent the syntactic structure of sentences in natural language. They have the following properties:
1. Each node (circle) is associated with a word (today, Peter, wants etc.), which is connected to the corresponding node by a dotted vertical line called a projection edge,
Similar resources
Extensible dependency grammar: a modular grammar formalism based on multigraph description
What the thesis is about: Extensible Dependency Grammar (XDG), a new grammar formalism for natural language, explores the combination of:
Modular Grammar Design with Typed Parametric Principles
This paper introduces a type system for Extensible Dependency Grammar (XDG) (Debusmann et al., 2004), a new, modular grammar formalism based on dependency grammar. As XDG is based on graph description, our emphasis is on capturing the notion of multigraph, a tuple of arbitrarily many graphs sharing the same set of nodes. An XDG grammar consists of the stipulation of an extensible set of parametri...
Multi-dimensional Graph Configuration for Natural Language Processing
Many tasks in computational linguistics can be regarded as configuration problems. In this paper, we introduce the notion of lexicalised multi-dimensional configuration problems (lmcps). This class of problems both has a wide range of linguistic applications, and can be solved in a straightforward way using state-of-the-art constraint programming technology. The paper falls into two main parts:...
An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies
A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...
Reconstructions of Deletions in a Dependency-based Description of Czech: Selected Issues
The goal of the present contribution is to put under scrutiny the language phenomenon commonly called ellipsis or deletion, especially from the point of view of its representation in the underlying syntactic level of a dependency-based syntactic description. We first give a brief account of the treatment of ellipsis in some present-day dependency-based accounts of this phenomenon (Sect. 1). The...